Scalability of Learning Arbiter and Combiner Trees from Partitioned Data

Authors

  • Philip K. Chan
  • Salvatore J. Stolfo
Abstract

Much of the research in inductive learning concentrates on problems with relatively small amounts of data residing at one location. In this paper we explore the scalability of learning arbiter and combiner trees from partitioned data. Arbiter and combiner trees integrate classifiers trained in parallel from small disjoint subsets. Previous work demonstrated their efficacy in terms of accuracy; this paper discusses their performance in terms of speedup and scalability. The performance of serial learning algorithms is evaluated. The performance of the algorithms used to construct combiner and arbiter trees in parallel is then analyzed. Our empirical results indicate that the techniques can effectively scale up to large datasets with millions of records.


Similar resources

Scalability of Hierarchical Meta-learning on Partitioned Data

In this paper we study the issue of how to scale machine learning algorithms, that typically are designed to deal with main-memory based datasets, to efficiently learn models from large distributed databases. We have explored an approach called meta-learning that is related to the traditional approaches of data reduction commonly employed in distributed database query processing systems. We explo...


Learning Arbiter and Combiner Trees from Partitioned Data for Scaling Machine Learning

Knowledge discovery in databases has become an increasingly important research topic with the advent of wide area network computing. One of the crucial problems we study in this paper is how to scale machine learning algorithms, that typically are designed to deal with main memory based datasets, to efficiently learn from large distributed databases. We have explored an approach called meta-lea...


A Study of Meta-Learning in Ensemble Based Classifier

The idea of ensemble methodology is to build a predictive model by integrating multiple models. It is well known that ensemble methods can be used for improving prediction performance. Researchers from various disciplines such as statistics and AI have considered the use of ensemble methodology. Meta-learning is a technique that seeks to compute higher-level classifiers (or classification models), c...


Classification of Two-Class Data Using Axis-Parallel Hyperrectangles

One of the machine learning tasks is supervised learning. In supervised learning we infer a function from labeled training data. The goal of supervised learning algorithms is learning a good hypothesis that minimizes the sum of the errors. A wide range of supervised algorithms is available, such as decision trees, SVM, and KNN methods. In this paper we focus on decision tree algorithms. When we ...


Parallel Formulations of Decision-tree Classification Algorithms

Classification decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to ...




Publication date: 2007